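Structured video representations in the form of dynamic scene graphs are an effective tool for several video understanding tasks. Compared to scene graph generation for static images, dynamic scene graph generation is more challenging because of the temporal dynamics of the scene and the inherent temporal fluctuations of predictions. We show that capturing long-term dependencies is the key to effective dynamic scene graph generation. We present the detect-track-recognize paradigm, which first constructs consistent long-term object tracklets from a video and then captures the dynamics of objects and visual relations. Experimental results show that our Dynamic Scene Graph Detection Transformer (DSG-DETR) outperforms state-of-the-art methods by a significant margin on the benchmark dataset Action Genome. We also perform ablation studies and validate the effectiveness of each component of the proposed approach.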
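The abstract does not say how the tracklets are built. The sketch below illustrates only the "track" stage: it greedily links per-frame detections into tracklets using same-label IoU overlap. The greedy matching, the `iou_thresh` value and the function names are assumptions made for illustration. This is not DSG-DETR's tracker, which may use a different association method.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tracklets(frames: List[List[Tuple[str, Box]]], iou_thresh: float = 0.5) -> List[dict]:
    """Greedily extend tracklets frame by frame (hypothetical helper).

    frames[t] lists the (label, box) detections in frame t. A tracklet that was
    extended in frame t-1 takes the unused same-label detection in frame t that
    overlaps its last box most (IoU >= iou_thresh). Unmatched detections start
    new tracklets. A tracklet that misses a frame is not resumed.
    """
    tracklets: List[dict] = []
    active: List[int] = []  # indices of tracklets extended in the previous frame
    for t, dets in enumerate(frames):
        used = set()
        next_active = []
        for ti in active:
            track = tracklets[ti]
            last_box = track["boxes"][t - 1]
            best, best_iou = None, iou_thresh
            for di, (label, box) in enumerate(dets):
                if di in used or label != track["label"]:
                    continue
                score = iou(last_box, box)
                if score >= best_iou:
                    best, best_iou = di, score
            if best is not None:
                used.add(best)
                track["boxes"][t] = dets[best][1]
                next_active.append(ti)
        for di, (label, box) in enumerate(dets):
            if di not in used:
                tracklets.append({"label": label, "boxes": {t: box}})
                next_active.append(len(tracklets) - 1)
        active = next_active
    return tracklets


frames = [
    [("person", (0, 0, 10, 10)), ("person", (50, 50, 60, 60))],
    [("person", (1, 1, 11, 11)), ("person", (51, 50, 61, 60)), ("cup", (20, 20, 25, 25))],
    [("person", (2, 2, 12, 12)), ("person", (52, 50, 62, 60)), ("cup", (20, 21, 25, 26))],
]
tracks = link_tracklets(frames)
```

Each tracklet keeps a stable identity across frames. A downstream recognizer can therefore reason over an object's whole history rather than over isolated frames.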
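We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of graph neural networks (GNNs) on large graphs. Large-scale GNN training has recently been dominated by sampling-based methods and by methods based on non-learnable message passing. SAR, on the other hand, is a distributed technique that can train any GNN type directly on an entire large graph. The key innovation in SAR is a distributed sequential rematerialization scheme: during the backward pass, it sequentially reconstructs and then frees pieces of the prohibitively large GNN computational graph. This yields excellent memory scaling, with per-worker memory consumption decreasing linearly with the number of workers, even for densely connected graphs. Using SAR, we report the largest application of full-batch GNN training to date and demonstrate large memory savings as the number of workers increases. We also present a general technique, based on kernel fusion and attention-matrix rematerialization, that optimizes both the runtime and the memory efficiency of attention-based models. We show that, combined with SAR, our optimized attention kernels lead to significant speedups and memory savings in attention-based GNNs.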
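The sketch below makes the memory argument concrete. It simulates only the sequential *aggregation* half of SAR, for a linear aggregation `A @ X`, in a single process. Each worker accumulates its own rows of `A @ X` by visiting one partition's features at a time and discarding them before fetching the next. The partitioning, the function name and the use of `copy()` to stand in for a network fetch are illustrative assumptions. Real communication is not shown, and neither is the backward-pass rematerialization (re-fetching partitions instead of storing them).

```python
import numpy as np


def sequential_aggregate(adj_blocks, feat_blocks, worker):
    """Compute worker `worker`'s rows of A @ X, one partition at a time.

    adj_blocks[w][q] is the adjacency block between the nodes owned by worker w
    and the nodes owned by worker q. feat_blocks[q] holds worker q's node features.
    Only one partition's features are held at once, so the extra memory peaks at
    the largest partition rather than at the whole graph.
    """
    out = None
    peak_rows = 0
    for q, x_q in enumerate(feat_blocks):
        fetched = x_q.copy()  # stands in for receiving partition q over the network
        peak_rows = max(peak_rows, fetched.shape[0])
        partial = adj_blocks[worker][q] @ fetched
        out = partial if out is None else out + partial
        del fetched  # free before fetching the next partition
    return out, peak_rows


rng = np.random.default_rng(0)
n, d, num_workers = 12, 3, 3
A = (rng.random((n, n)) < 0.3).astype(float)  # toy adjacency matrix
X = rng.standard_normal((n, d))               # toy node features
parts = np.array_split(np.arange(n), num_workers)
adj_blocks = [[A[np.ix_(pw, pq)] for pq in parts] for pw in parts]
feat_blocks = [X[p] for p in parts]

results = [sequential_aggregate(adj_blocks, feat_blocks, w) for w in range(num_workers)]
H = np.vstack([out for out, _ in results])
peaks = [peak for _, peak in results]
```

Stacking the workers' outputs reproduces the full `A @ X`. Yet no worker ever holds more than `n / num_workers` remote feature rows at once, which is the linear per-worker scaling the abstract describes.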
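Recent improvements in the performance of state-of-the-art (SOTA) methods for graph representation learning (GRL) have come at the cost of significant computational resources for training, e.g., for calculating gradients via backpropagation over many data epochs. Meanwhile, singular value decomposition (SVD) can find closed-form solutions to convex problems using only a handful of epochs. In this paper, we make GRL more computationally tractable for those with modest hardware. We design a framework that computes the SVD of \textit{implicitly} defined matrices, and apply this framework to several GRL tasks. For each task, we derive a linear approximation of a SOTA model. We design an (expensive-to-store) matrix $\mathbf{M}$ and train the model in closed form via the SVD of $\mathbf{M}$, without calculating the entries of $\mathbf{M}$. Our models converge to a unique point in one step, without calculating gradients, and show competitive empirical test performance on various graphs, such as article-citation and biological-interaction networks. More importantly, SVD can initialize a deeper model that is architected to be non-linear almost everywhere but behaves linearly when its parameters reside on a hyperplane, onto which SVD initializes it. The deeper model can then be fine-tuned within only a few epochs. Overall, our procedure trains hundreds of times faster than state-of-the-art methods while remaining competitive in empirical test performance. We open-source our implementation: https://github.com/samihaija/isvd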
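The key computational idea is taking an SVD of $\mathbf{M}$ without ever writing down its entries. It can be sketched with block subspace iteration, which touches $\mathbf{M}$ only through matrix products. This generic numerical method is chosen for illustration and is not the authors' implementation. The toy graph, the choice `M = A @ A` and the function name are assumptions.

```python
import numpy as np


def implicit_top_k_svd(matmat, rmatmat, n_cols, k, n_iter=50, seed=0):
    """Approximate top-k SVD of a matrix M available only through products.

    matmat(X) must return M @ X and rmatmat(Y) must return M.T @ Y. M itself is
    never formed. Uses block subspace (power) iteration with re-orthonormalization.
    """
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n_cols, k)))
    for _ in range(n_iter):
        Y, _ = np.linalg.qr(matmat(Q))   # orthonormal basis for range(M Q)
        Q, _ = np.linalg.qr(rmatmat(Y))  # orthonormal basis for range(M^T Y)
    # SVD of the small (n_rows x k) matrix M Q, then rotate Q accordingly.
    U, s, Wt = np.linalg.svd(matmat(Q), full_matrices=False)
    V = Q @ Wt.T
    return U, s, V


# Toy graph: disjoint cliques K5, K4, K3 (adjacency eigenvalues 4, 3, 2 and -1).
sizes = [5, 4, 3]
n = sum(sizes)
A = np.zeros((n, n))
start = 0
for size in sizes:
    A[start:start + size, start:start + size] = 1.0 - np.eye(size)
    start += size

# Implicit M = A @ A (two-hop walk counts): only products with A are computed.
matmat = lambda X: A @ (A @ X)
rmatmat = lambda Y: A.T @ (A.T @ Y)
U, s, V = implicit_top_k_svd(matmat, rmatmat, n_cols=n, k=3)
```

For this graph, the top singular values of `A @ A` are exactly 16, 9 and 4, and the sketch recovers them while only ever multiplying by `A`. On a large sparse graph, the same pattern avoids materializing a dense $\mathbf{M}$.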
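The development of deep segmentation models for computational pathology (CPath) can help foster the investigation of interpretable morphological biomarkers. Yet such approaches face a major bottleneck, because supervised deep learning models require an abundance of accurately labelled data. The problem is exacerbated in CPath, where producing detailed annotations usually requires input from a pathologist who can distinguish between different tissue constructs and nuclei. Manually labelling nuclei may not be a feasible way to collect large-scale annotated datasets, especially when a single image region can contain thousands of different cells. However, relying solely on automatically generated annotations would limit the accuracy and reliability of the ground truth. Therefore, to help overcome these challenges, we propose a multi-stage annotation pipeline for collecting large-scale histology image analysis datasets, with pathologist-in-the-loop refinement steps. Using this pipeline, we generate the largest known nuclear instance segmentation and classification dataset, containing nearly half a million labelled nuclei in H&E-stained colon tissue. We release the dataset and encourage the research community to use it to drive forward the development of downstream cell-based models in CPath.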
The mobility of unmanned aerial vehicles (UAVs) enables flexible and customized federated learning (FL) at the network edge. However, the underlying uncertainties in the aerial-terrestrial wireless channel may lead to a biased FL model. In particular, the reliability of the wireless channel governs how the UAVs distribute the global model and aggregate local updates within each FL round. This creates an undesirable bias toward the training data of ground devices with better channel conditions, and against devices with worse ones. This paper characterizes the global bias problem of aerial FL in large-scale UAV networks. To resolve it, the paper proposes a channel-aware distribution and aggregation scheme that enforces equal contribution from all devices to FL training. We demonstrate the convergence of the proposed method through experiments on the MNIST dataset and show its superiority over existing methods. The obtained results enable system parameters to be tuned to relieve the impact of aerial channel deficiencies on the FL convergence rate.
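The abstract does not give the scheme's equations. One standard way to enforce equal contribution over unreliable links is inverse-probability weighting. If device i's update reaches the UAV with probability p_i, scaling it by 1 / (N p_i) makes the aggregate unbiased. The sketch below assumes independent links with known success probabilities. The technique is swapped in for illustration and is not necessarily the paper's method. All function names and numbers are hypothetical.

```python
import itertools

import numpy as np


def naive_aggregate(updates, success_prob, received):
    """Plain average of whichever local updates arrived (zeros if none did)."""
    updates = np.asarray(updates, dtype=float)
    mask = np.asarray(received, dtype=bool)
    if not mask.any():
        return np.zeros(updates.shape[1])
    return updates[mask].mean(axis=0)


def channel_aware_aggregate(updates, success_prob, received):
    """Inverse-probability weighting: scale each received update by 1 / (N * p_i)."""
    updates = np.asarray(updates, dtype=float)
    p = np.asarray(success_prob, dtype=float)
    mask = np.asarray(received, dtype=float)
    weights = mask / (len(p) * p)
    return weights @ updates


def expected_aggregate(aggregate, updates, success_prob):
    """Exact expectation over all 2^N independent reception outcomes."""
    total = 0.0
    for mask in itertools.product([0, 1], repeat=len(success_prob)):
        prob = np.prod([p if m else 1.0 - p for p, m in zip(success_prob, mask)])
        total = total + prob * aggregate(updates, success_prob, mask)
    return total


updates = [[1.0], [0.0], [0.0]]   # device 0 has a distinct local update
success_prob = [0.9, 0.3, 0.3]    # device 0 also has the best channel
ideal = np.mean(updates, axis=0)  # equal contribution from every device
naive = expected_aggregate(naive_aggregate, updates, success_prob)
aware = expected_aggregate(channel_aware_aggregate, updates, success_prob)
```

In expectation, the naive average is about 0.657, pulled toward the well-connected device 0. The weighted aggregate matches the ideal 1/3. The price is higher variance when some p_i are small.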
Differentiable Search Indices (DSIs) encode a corpus of documents in the parameters of a model and use the same model to map queries directly to relevant document identifiers. Despite the strong performance of DSI models, deploying them in situations where the corpus changes over time is computationally expensive because reindexing the corpus requires re-training the model. In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents. Across different model scales and document identifier representations, we show that continual indexing of new documents leads to considerable forgetting of previously indexed documents. We also hypothesize and verify that the model experiences forgetting events during training, leading to unstable learning. To mitigate these issues, we investigate two approaches. The first focuses on modifying the training dynamics. Flatter minima implicitly alleviate forgetting, so we optimize for flatter loss basins and show that the model stably memorizes more documents (+12\%). Next, we introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task. Extensive experiments on novel continual indexing benchmarks based on Natural Questions (NQ) and MS MARCO demonstrate that our proposed solution mitigates forgetting by a significant margin. Concretely, it improves the average Hits@10 by $+21.1\%$ over competitive baselines for NQ and requires $6$ times fewer model updates compared to re-training the DSI model for incrementally indexing five corpora in a sequence.
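The abstract does not name the optimizer used to reach flatter basins. Sharpness-aware minimization (SAM) is one well-known option. It takes the descent gradient at a nearby worst-case perturbation of the weights rather than at the weights themselves. The sketch below applies SAM to a toy quadratic loss. The loss, `lr` and `rho` are illustrative assumptions, not DSI++ settings.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.05, rho=0.05, eps=1e-12):
    """One sharpness-aware minimization (SAM) update."""
    g = grad_fn(w)
    e = rho * g / (np.linalg.norm(g) + eps)  # first-order worst-case perturbation of norm rho
    g_sharp = grad_fn(w + e)                 # gradient at the perturbed point
    return w - lr * g_sharp                  # descend from the original weights


H = np.diag([1.0, 10.0])  # toy quadratic with one sharp direction
loss = lambda w: 0.5 * w @ H @ w
grad = lambda w: H @ w

w = np.array([1.0, 1.0])
initial_loss = loss(w)
for _ in range(100):
    w = sam_step(w, grad)
final_loss = loss(w)
```

Because the perturbation is largest along sharp directions, SAM penalizes regions where the loss rises quickly. This is the property that the abstract links to reduced forgetting.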
One of the main challenges in deep learning-based underwater image enhancement is the limited availability of high-quality training data. Underwater images are difficult to capture and are often of poor quality because water distorts them and degrades their colour and contrast. This makes it difficult to train supervised deep learning models on large and diverse datasets, which can limit model performance. In this paper, we explore an alternative to supervised underwater image enhancement. We propose a novel unsupervised underwater image enhancement framework. It uses a conditional variational autoencoder (cVAE) to train a deep learning model with probabilistic adaptive instance normalization (PAdaIN), together with a statistically guided multi-colour space stretch, to produce realistic underwater images. The resulting framework, which we call UDnet, consists of a U-Net as a feature extractor and a PAdaIN to encode uncertainty. To improve the visual quality of the images generated by UDnet, we use a statistically guided multi-colour space stretch module. This module ensures visual consistency with the input image and provides an alternative to training with a ground-truth image. The proposed model needs no manual human annotation, can learn from a limited amount of data, and achieves state-of-the-art results on underwater images. We evaluated the proposed framework on eight publicly available datasets. The results show that it performs competitively with other state-of-the-art approaches on both quantitative and qualitative metrics. Code is available at https://github.com/alzayats/UDnet .
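The abstract does not define PAdaIN. One plausible reading is adaptive instance normalization (AdaIN) whose target per-channel mean and standard deviation are sampled from predicted Gaussians via the reparameterization trick. Under that reading, the same input could yield different plausible enhancements. The sketch below follows this interpretation. The function name, the parameter names and the Gaussian parameterization are all assumptions, not UDnet's code.

```python
import numpy as np


def padain(x, mu_mean, mu_logvar, sigma_mean, sigma_logvar, rng, eps=1e-5):
    """AdaIN with sampled per-channel statistics (hypothetical PAdaIN sketch).

    x: feature map of shape (C, H, W). Each *_mean / *_logvar argument has shape
    (C,) and parameterizes a Gaussian over a channel's target mean or std.
    """
    c = x.shape[0]
    mu_x = x.mean(axis=(1, 2), keepdims=True)
    std_x = x.std(axis=(1, 2), keepdims=True) + eps
    # Reparameterization trick: sample = mean + std * standard normal noise.
    mu_s = mu_mean + np.exp(0.5 * mu_logvar) * rng.standard_normal(c)
    sigma_s = sigma_mean + np.exp(0.5 * sigma_logvar) * rng.standard_normal(c)
    normalized = (x - mu_x) / std_x  # zero mean, (almost) unit std per channel
    return sigma_s[:, None, None] * normalized + mu_s[:, None, None]


rng = np.random.default_rng(0)
features = rng.standard_normal((3, 8, 8)) * 4.0 + 2.0
target_mean = np.array([0.1, 0.5, 0.9])
target_std = np.array([0.2, 0.3, 0.4])
tiny_logvar = np.full(3, -50.0)  # near-zero variance: essentially deterministic
out = padain(features, target_mean, tiny_logvar, target_std, tiny_logvar, rng)
```

With near-zero predicted variance, this reduces to plain AdaIN: each output channel takes on the target mean and standard deviation. Larger variances would inject sampled variation into the restyled features.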
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practices and the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in biomedical image analysis, we designed an international survey. It was issued to all participants of challenges held in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while receiving prize money played only a minor role (16%). A median of 80 working hours was spent on method development, yet a large portion of participants (32%) stated that they did not have enough time for it. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. Only 37% of the participants performed k-fold cross-validation on the training set. Only 50% performed ensembling, using either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
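Patch-based training was the most common workaround reported for samples too large to process at once. It usually relies on tiling a volume into overlapping patches and stitching the per-patch outputs back together. Below is a minimal NumPy sketch. It assumes a fixed patch size and stride per axis, with the stride no larger than the patch, and averages overlaps during stitching. The function names and parameters are hypothetical.

```python
import itertools

import numpy as np


def patch_starts(length, patch, stride):
    """Start offsets along one axis so that patches of size `patch` cover [0, length)."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # extra patch flush with the far border
    return starts


def extract_patches(volume, patch, stride):
    """Tile an N-D array into (possibly overlapping) patches; return patches and origins."""
    grids = [patch_starts(n, p, s) for n, p, s in zip(volume.shape, patch, stride)]
    patches, origins = [], []
    for origin in itertools.product(*grids):
        window = tuple(slice(o, o + p) for o, p in zip(origin, patch))
        patches.append(volume[window])
        origins.append(origin)
    return patches, origins


def stitch(patches, origins, shape):
    """Reassemble patches (e.g. per-patch predictions), averaging where they overlap."""
    total = np.zeros(shape)
    count = np.zeros(shape)
    for p, origin in zip(patches, origins):
        window = tuple(slice(o, o + s) for o, s in zip(origin, p.shape))
        total[window] += p
        count[window] += 1
    return total / count


volume = np.arange(10 * 12 * 7, dtype=float).reshape(10, 12, 7)
patches, origins = extract_patches(volume, patch=(4, 4, 4), stride=(3, 3, 3))
reconstructed = stitch(patches, origins, volume.shape)
```

Stitching the untouched patches reproduces the input exactly. In practice, the per-patch network outputs would be passed to `stitch` instead.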
Training large, deep neural networks to convergence can be prohibitively expensive. As a result, often only a small selection of popular, dense models are reused across different contexts and tasks. Increasingly, sparsely activated models, which seek to decouple model size from computation costs, are becoming an attractive alternative to dense models. Although more efficient in terms of quality and computation cost, sparse models remain data-hungry and costly to train from scratch in the large scale regime. In this work, we propose sparse upcycling -- a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint. We show that sparsely upcycled T5 Base, Large, and XL language models and Vision Transformer Base and Large models, respectively, significantly outperform their dense counterparts on SuperGLUE and ImageNet, using only ~50% of the initial dense pretraining sunk cost. The upcycled models also outperform sparse models trained from scratch on 100% of the initial dense pretraining computation budget.
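The core of sparse upcycling is initializing every expert of a Mixture-of-Experts layer from the dense checkpoint's MLP weights, and it fits in a few lines. The following are all assumptions for illustration: the ReLU MLP, the top-k softmax router with renormalized gates, and the small random router initialization. The paper's exact routing and initialization choices are not reproduced. With identical experts and gates that sum to one, the upcycled layer initially reproduces the dense layer's output. Training therefore resumes from the dense model's function rather than from scratch.

```python
import numpy as np


def dense_mlp(x, w_in, w_out):
    """Dense feed-forward block: ReLU(x W_in) W_out."""
    return np.maximum(x @ w_in, 0.0) @ w_out


def upcycle(w_in, w_out, num_experts, rng):
    """Copy the dense MLP into every expert; only the router is newly initialized."""
    experts = [(w_in.copy(), w_out.copy()) for _ in range(num_experts)]
    router = rng.standard_normal((w_in.shape[0], num_experts)) * 0.02
    return experts, router


def moe_forward(x, experts, router, k=2):
    """Token-wise top-k routing with softmax gates renormalized over the chosen experts."""
    logits = x @ router
    out = np.zeros((x.shape[0], experts[0][1].shape[1]))
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-k:]
        gates = np.exp(logits[t, top] - logits[t, top].max())
        gates /= gates.sum()
        for gate, e in zip(gates, top):
            w_in, w_out = experts[e]
            out[t] += gate * dense_mlp(x[t], w_in, w_out)
    return out


rng = np.random.default_rng(0)
d_model, d_ff, tokens = 8, 16, 5
w_in = rng.standard_normal((d_model, d_ff))
w_out = rng.standard_normal((d_ff, d_model))
x = rng.standard_normal((tokens, d_model))

experts, router = upcycle(w_in, w_out, num_experts=4, rng=rng)
upcycled = moe_forward(x, experts, router)
dense = dense_mlp(x, w_in, w_out)
```

Once training updates the experts separately, they diverge and the layer gains capacity. The per-token compute stays at k expert evaluations.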
Chromosome analysis is essential for diagnosing genetic disorders. For hematologic malignancies, identification of somatic clonal aberrations by karyotype analysis remains the standard of care. However, karyotyping is costly and time-consuming: the process is largely manual, and identifying and annotating aberrations requires expertise. Efforts to automate karyotype analysis have so far fallen short in aberration detection. Our training set comprised ~10k patient specimens and ~50k karyograms collected over more than 5 years at the Fred Hutchinson Cancer Center. From it, we created a labeled set of images representing individual chromosomes. These individual chromosomes were used to train and assess deep learning models for classifying the 24 human chromosomes and identifying chromosomal aberrations. The most accurate models used the recently introduced Topological Vision Transformers (TopViTs) with 2-level-block-Toeplitz masking to incorporate a structural inductive bias. TopViT outperformed CNN (Inception) models, with >99.3% accuracy for chromosome identification, and exceeded 99% accuracy in detecting most aberrations. Notably, we were able to show high-quality performance even in "few-shot" learning scenarios. Incorporating the definition of clonality substantially improved both precision and recall (sensitivity). When applied to "zero-shot" scenarios, the model captured aberrations without training, with perfect precision at >50% recall. Together, these results show that modern deep learning models can approach expert-level performance for chromosome aberration detection. To our knowledge, this is the first study demonstrating the downstream effectiveness of TopViTs. These results open up exciting opportunities not only for expediting patient results but also for providing a scalable technology for early screening of low-abundance chromosomal lesions.
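Consider an h x w grid of image patches where the mask entry between two patches depends only on their row offset and column offset. Ordering the patches row-major then yields a 2-level block-Toeplitz mask. It is block-Toeplitz across grid rows, and each block is itself Toeplitz across grid columns. The sketch below builds such a mask and uses it to modulate softmax attention elementwise. The following are hypothetical illustrative choices, not the TopViT configuration used in the study: the decay function `f`, the grid size, and the way the mask is combined with attention.

```python
import numpy as np


def block_toeplitz_mask(h, w, f):
    """Mask over an h x w patch grid with M[(a, b), (c, d)] = f(a - c, b - d).

    Tokens are ordered row-major, so M consists of h x h blocks that depend only
    on a - c (block-Toeplitz), and each w x w block depends only on b - d (Toeplitz).
    """
    rows, cols = np.divmod(np.arange(h * w), w)
    di = rows[:, None] - rows[None, :]
    dj = cols[:, None] - cols[None, :]
    return f(di, dj)


def masked_attention(q, k, v, mask):
    """Softmax attention whose unnormalized scores are multiplied by the mask."""
    s = q @ k.T / np.sqrt(q.shape[1])
    scores = np.exp(s - s.max(axis=1, keepdims=True)) * mask
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v, weights


h, w, dim = 3, 4, 8
decay = lambda di, dj: np.exp(-0.5 * (np.abs(di) + np.abs(dj)))  # hypothetical choice of f
mask = block_toeplitz_mask(h, w, decay)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((h * w, dim)) for _ in range(3))
out, weights = masked_attention(q, k, v, mask)
```

The mask depends only on relative patch position. This is the kind of structural inductive bias the abstract refers to, and the Toeplitz structure is what makes fast (e.g., FFT-based) implementations possible.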